Abstract

We consider a multi-cell frequency-selective fading uplink channel (network MIMO) from K single-antenna user terminals 
(UTs) to B cooperative base stations (BSs) with M antennas each. The BSs, assumed to be oblivious of the applied codebooks, 
forward compressed versions of their observations to a central station (CS) via capacity limited backhaul links. The CS jointly 
decodes the messages from all UTs. Since the BSs and the CS are assumed to have no prior channel state information (CSI), the 
channel needs to be estimated during its coherence time. Based on a lower bound of the ergodic mutual information, we determine 
the optimal fraction of the coherence time used for channel training, taking different path losses between the UTs and the BSs 
into account. We then study how the optimal training length is impacted by the backhaul capacity. Although our analytical results 
are based on a large system limit, we show by simulations that they provide very accurate approximations for even small system 
dimensions. 

Index Terms 

Coordinated Multi-Point (CoMP), network MIMO, multi-cell processing, channel estimation, imperfect channel state information
(CSI), random matrix theory.

I. Introduction 

NETWORK MIMO has become the synonym for cooperative communications in the cellular context and is regarded as an 
important concept to boost the interference limited performance of today's cellular networks. It is often also referred to as 
multi-cell processing or distributed antenna systems and corresponds to a communication system where multiple base stations 
(BSs), connected via high speed backhaul links to a central station (CS), jointly process data either received over the uplink 
or transmitted over the downlink. If the BSs could cooperate without any restrictions with regards to the backhaul capacity, 
processing delay, computing complexity and the availability of channel state information (CSI), the multi-cell interference 
channel would be transformed into a multiple-access (uplink) or broadcast (downlink) channel without multi-cell interference. 
This argument motivated the concept of network MIMO and it has been shown in many works, e.g. [1], that BS-cooperation
has the potential to realize significant gains in throughput and reliability. 

So far, the treatment of multi-cell cooperation in the literature has been either information-theoretic but limited to simple 
models [2], [3] or based on simulations to account for more realistic and complex network structures [4], [5], [6]. The most
common and analytically tractable network models are the Wyner model [7], [8] and the soft hand-off model [9], [10], which
consider cooperation between either two or three adjacent BSs on an infinite linear or circular cellular array. Variants of both 
models have been studied under various assumptions on the transmission schemes and the fading characteristics. 

In practical systems, perfect BS-cooperation or global processing is very difficult, if not impossible, to achieve. The main 
limitations are threefold: (i) limited backhaul capacity, (ii) local connectivity and (iii) imperfect CSI at the CS and the BSs.
Therefore, most of the recent research targets the problem of constrained cooperation. For a detailed overview of this topic we
refer to the surveys [11], [12]. Information-theoretic implications of limited backhaul capacity have been studied separately
for the uplink and downlink in [13] and [14]. Recently, the optimal amount of user data sharing between the BSs for the
downlink with linear beamforming and backhaul constraints was studied in [15]. The difficulties related to connecting a large
number of BSs to a single CS have motivated the study of systems with only locally connected BSs [10], [16], [17]. Several
distributed algorithms for the uplink [18] and downlink [19], [20] have been proposed and it was shown that even with local
BS connection near-optimal performance can be achieved with a reasonable amount of message passing and computational 
complexity. 

One of the most critical limitations of a practical network MIMO system, somewhat overlooked compared to (i) and (ii), arises
from the substantial overhead related to the acquisition of CSI (iii), indispensable to achieve the full diversity or multiplexing
gains. This overhead becomes paramount, in particular for fast fading channels, when the number of antennas, sub-carriers,
user terminals (UTs) or BSs grows [21], [5], [6], [22]. Usually, CSI for the uplink is acquired through pilot signals sent by the
UTs. This implies that a part of the coherence time of the channel needs to be sacrificed to obtain CSI with a sufficiently high
quality. The inherent tradeoff between the resources dedicated to channel estimation and data transmission has been studied for
the point-to-point MIMO channel [23], [24] and the multi-user downlink [25]. Recently, this problem was also addressed in the
context of network MIMO systems, although with a different focus. In [22], [5], [6], the authors compare several multi-cellular
system architectures and conclude that the downlink performance of network MIMO systems is mainly limited by the inevitable 
acquisition of CSI (rather than by limited backhaul capacity). They also demonstrate that a conventional cellular system might 
outperform a network MIMO system under some circumstances assuming that the number of coordinated antennas and the 
used training overhead for both systems are the same. This means in essence that simply installing more antennas per BS can 
lead to higher performance improvements than installing costly backhaul infrastructure. 

The imperfections detailed above call for robust strategies adapted to restricted BS-cooperation. Some schemes [26], [27]
rely on local CSI at the BSs and statistical CSI at the CS, whereas others [28], [4] consider serving only certain subsets of UTs
with multiple BSs. Several BS-cooperation schemes have been studied in [29], [30] for the combination of limited backhaul
capacity and imperfect CSI. The problem of "pilot contamination" caused by non-orthogonal training sequences in adjacent
cells, which can lead to significant inter-cell interference, was addressed in [31] and an optimized multi-cell precoding technique
has been proposed. 

In this paper, we also consider limited BS-cooperation by focusing especially on the effects of imperfect CSI (iii). More 
precisely, we study the performance of the multi-cell uplink with partially restricted cooperation assuming that: 

• The BSs act as oblivious relays which forward compressed versions of their received signals to the CS via orthogonal 
error- and delay-free backhaul links, each of fixed capacity C bits/channel use. 

• The CS estimates the channel based on pilot tones sent by the UTs. 

• The CS jointly processes the received signals from all BSs. 

We consider a lower bound of the normalized ergodic mutual information of the network MIMO uplink channel with imperfect
CSI and limited backhaul capacity, called the net ergodic achievable rate $R_{\mathrm{net}}(\tau)$. For a given channel coherence time $T$, we
attempt to find the optimal length $\tau^*$ of the pilot sequences for channel training which maximizes $R_{\mathrm{net}}(\tau)$. As this optimization
problem is in general intractable, we study a deterministic approximation $\bar{R}_{\mathrm{net}}(\tau)$ of $R_{\mathrm{net}}(\tau)$, based on large random matrix
theory.

The main contribution of this work is to show that optimizing $\bar{R}_{\mathrm{net}}(\tau)$ instead of $R_{\mathrm{net}}(\tau)$ is optimal in the large system limit.
To this end, we provide a closed-form expression of the derivative of $\bar{R}_{\mathrm{net}}(\tau)$ (Theorem 2), prove the concavity of $\bar{R}_{\mathrm{net}}(\tau)$ for
channel matrices with a doubly regular variance profile (Theorem 3), and show that $\bar{\tau}^*$, which maximizes $\bar{R}_{\mathrm{net}}(\tau)$, converges
to $\tau^*$ in the large system limit (Theorem 4). We further demonstrate by simulations that our asymptotic results yield tight
approximations for systems of small dimensions with as few as three BSs and UTs. In addition, we study the effects of limited
backhaul capacity on the optimal channel training length. Since we assume that the CS estimates all channels based on the
compressed observations from the BSs, the channel estimates are impaired by thermal noise and quantization errors. Thus,
increasing the backhaul capacity leads to improved channel estimates and, hence, smaller values of $\tau^*$.

The determination of the optimal training length $\tau^*$ in an uplink network MIMO setting with arbitrary path loss between
the UTs and BSs and limited backhaul capacity appears to be a novel result, although we limit our investigation to a simple
setting where $B$ cooperative BSs do not suffer from interference outside the network. The extension of this work to more
realistic networks, such as clustered systems, is left to future investigations. Although the use of random matrix theory in the
context of network MIMO is not new, see e.g. [32], [33], we present a novel application to an optimization problem in wireless
communications.

The paper is structured as follows. The system model, including compression, channel training and data transmission, is
described in Section II. The net ergodic achievable rate $R_{\mathrm{net}}(\tau)$ is defined in Section III, where we also present the deterministic
approximation $\bar{R}_{\mathrm{net}}(\tau)$ and discuss the optimization of the training length $\tau$. Numerical results and concluding remarks are
given in Sections IV and V, respectively.

Notations: Boldface lowercase and uppercase letters designate column vectors and matrices, respectively. For a matrix $\mathbf{X}$,
$x_{ij}$ or $[\mathbf{X}]_{ij}$ denotes the $(i,j)$ entry of $\mathbf{X}$, $|\mathbf{X}|$ and $\mathrm{tr}\,\mathbf{X}$ denote the determinant and trace, and $\mathbf{X}^{\mathsf{T}}$ and $\mathbf{X}^{\mathsf{H}}$ denote the transpose
and complex conjugate transpose. For two matrices $\mathbf{X}$ and $\mathbf{Y}$, $\mathbf{X} \otimes \mathbf{Y}$ denotes the Kronecker (tensor) product. We denote
an identity matrix of size $M$ as $\mathbf{I}_M$, and $\mathrm{diag}(x_1, \dots, x_M)$ is a diagonal matrix of size $M$ with the elements $x_i$ on its main
diagonal. We use $\mathbf{x} \sim \mathcal{CN}(\mathbf{m}, \mathbf{R})$ to state that the vector $\mathbf{x}$ has a circular symmetric complex Gaussian distribution with mean
$\mathbf{m}$ and covariance matrix $\mathbf{R}$. The natural logarithm is denoted by $\log(\cdot)$.
Fig. 1. Schematic system model for M = 2 antennas per BS. The BSs compress and forward their received signals to the CS via orthogonal backhaul links
of capacity C bits/channel use. The CS jointly processes the received data from all BSs. 



II. System Model 

A. Channel Model 

We consider a multi-cell frequency-selective fading uplink channel from $K$ single-antenna UTs to $B$ BSs with $M$ antennas
each.$^{2}$ A schematic diagram of the channel model for $M = 2$ is given in Fig. 1. Communication takes place simultaneously from
all UTs to all BSs on $L$ parallel sub-carriers assuming an orthogonal frequency-division multiplexing (OFDM) transmission
scheme. The stacked receive vector of all BSs on the $\ell$th sub-carrier, $\mathbf{y}(\ell) = [y_1(\ell), \dots, y_{BM}(\ell)]^{\mathsf{T}} \in \mathbb{C}^{BM}$, at a given time
reads

$$\mathbf{y}(\ell) = \mathbf{H}(\ell)\mathbf{x}(\ell) + \mathbf{n}(\ell) \tag{1}$$

where $\mathbf{x}(\ell) = [x_1(\ell), \dots, x_K(\ell)]^{\mathsf{T}} \in \mathbb{C}^{K}$ is the vector of the transmitted signals of all UTs on sub-carrier $\ell$, $\mathbf{n}(\ell) \sim
\mathcal{CN}(\mathbf{0}, \mathbf{I}_{BM})$ is a vector of additive noise and $\mathbf{H}(\ell) \in \mathbb{C}^{BM \times K}$ is the aggregated channel matrix from all UTs to all BSs on
the $\ell$th sub-carrier.

We consider a discrete-time block-fading channel model where the channel remains constant for a coherence block of $T$
channel uses and then changes randomly from one block to the other. We let $T = T_c W_c$, where $W_c$ is the bandwidth per
sub-carrier in Hz and $T_c$ the channel coherence time in seconds. Presuming that the bandwidth of each sub-carrier $W_c$ is on
the order of the channel coherence bandwidth, that the antenna spacing at the BSs is sufficiently large and that the channels
from the UTs to the BSs are uncorrelated, the channel matrices $\mathbf{H}_b(\ell) \in \mathbb{C}^{M \times K}$, $b = 1, \dots, B$, from the UTs to the BSs can
be modeled as

$$\mathbf{H}_b(\ell) = \mathbf{W}_b(\ell)\, \mathrm{diag}\left(\sqrt{a_{b1}}, \dots, \sqrt{a_{bK}}\right) \tag{2}$$

where $\mathbf{W}_b(\ell) \in \mathbb{C}^{M \times K}$ is a standard complex Gaussian matrix and $a_{bk}$ denotes the inverse path loss between UT $k$ and BS
$b$.$^{3}$ For later use, we define the matrix $\mathbf{V} \in \mathbb{R}_+^{BM \times K}$ in the following way:

$$\mathbf{V} = \mathbf{A} \otimes \mathbf{1}_M \tag{3}$$

where $\mathbf{A} \in \mathbb{R}_+^{B \times K}$ is the inverse path loss matrix with elements $\{a_{bk}\}$ and $\mathbf{1}_M$ is an $M$-dimensional column vector with all
entries equal to one, such that the elements $\{v_{ij}\}$ of $\mathbf{V}$ satisfy $v_{ij} = a_{\lceil i/M \rceil j}$. Under these assumptions, the elements $\{h_{ij}(\ell)\}$
of the matrix $\mathbf{H}(\ell)$ are independent circular symmetric complex Gaussian random variables with zero mean and variance $v_{ij}$,
i.e., $h_{ij}(\ell) \sim \mathcal{CN}(0, v_{ij})$. We refer to $\mathbf{V}$ as the variance profile of the channel matrix $\mathbf{H}(\ell)$ and assume in the sequel that
$\mathbf{V}$ is perfectly known at the CS while each BS $b$ only knows the distribution of its local channels $\mathbf{H}_b(\ell)$, $\ell = 1, \dots, L$. In a
practical system, the channel coherence bandwidth might be significantly larger than the bandwidth of a sub-carrier so that
$\{h_{ij}(\ell)\}$ would exhibit some correlation with respect to $\ell$. From a channel estimation perspective, the assumption of i.i.d.
channel coefficients represents a worst case since sub-carrier correlation cannot be exploited in the estimation process.
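The construction of the variance profile and of one channel realization can be sketched numerically as follows. This is only an illustrative sketch: the function names, the example matrix `A_example` and the use of NumPy are assumptions made here, not part of the paper.

```python
import numpy as np

def variance_profile(A, M):
    """Return V = A (Kronecker) 1_M: every row of A is repeated M times."""
    A = np.asarray(A, dtype=float)
    return np.kron(A, np.ones((M, 1)))

def sample_channel(V, rng):
    """Draw H with independent CN(0, v_ij) entries (one sub-carrier)."""
    W = (rng.standard_normal(V.shape) + 1j * rng.standard_normal(V.shape)) / np.sqrt(2)
    return np.sqrt(V) * W

# Hypothetical example with B = 2 BSs, K = 2 UTs and M = 2 antennas per BS.
A_example = np.array([[1.0, 0.1],
                      [0.2, 2.0]])
V_example = variance_profile(A_example, M=2)
H_example = sample_channel(V_example, np.random.default_rng(0))
```

Row $i$ of `V_example` (counting from one) then equals row $\lceil i/M \rceil$ of `A_example`, which is the relation between $v_{ij}$ and the entries of $\mathbf{A}$ stated above.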

For simplicity, we assume Gaussian signaling with uniform power allocation, i.e., $x_k(\ell) \sim \mathcal{CN}(0, P/L)$, i.i.d. over $\ell$ and $k$,
which is not necessarily optimal in the presence of channel estimation errors [34], [23]. Although optimal power allocation
over the sub-carriers would provide significant gains, it would require perfect channel knowledge at the UTs or some sort of
feedback from the BSs/CS. Since we assume neither feedback nor CSI at the UTs and since the channel statistics are the same
for all sub-carriers, uniform power allocation seems to be a reasonable choice.

$^{2}$Our results can be easily extended to the case where each BS has a different number of antennas.

$^{3}$Note that the path loss is independent of the sub-carrier index $\ell$. This might not be the case for extremely large bandwidth but it is a reasonable assumption
for most practical scenarios.



B. Compression at the BSs 

The BSs are assumed to be oblivious to the applied codebooks of the UTs and forward compressed versions $y_i'(\ell)$ of
their received signal sequences $y_i(\ell)$ to the CS via orthogonal backhaul links, each of capacity $C$ bits per channel use.$^{4}$ We
also assume that the BSs and the CS have no prior knowledge of the instantaneous channel realizations. Under this setting,
we consider a simple, sub-optimal compression scheme which neither exploits correlations between the received signals at
different antennas nor adapts the employed quantization codebook to the actual channel realization. Thus, a single quantization
codebook for the compression of each sequence $y_i(\ell)$ is used. This is in contrast to existing works, e.g. [35], which rely on the
assumption of full CSI at the BSs and the CS to apply optimized and channel dependent compression schemes. For a detailed
discussion of different (distributed) compression schemes, we refer to [35], [36], [30] and references therein.

The rate-distortion function for the source $y_i(\ell)$ with squared error distortion is given as [37, Theorem 10.2.1]

$$R_D\left(\sigma_i^2(\ell)\right) = \min_{f_{y_i'(\ell) \mid y_i(\ell)}} I\left(y_i'(\ell); y_i(\ell)\right) \tag{4}$$

where the minimization is over all conditional probability density functions $f_{y_i'(\ell) \mid y_i(\ell)}$ satisfying the expected distortion
constraint $\sigma_i^2(\ell)$. Similar to the so-called "elementary compression scheme" in [35], our compression scheme is based on
an underlying complex Gaussian "test channel" defined by

$$y_i'(\ell) = y_i(\ell) + q_i(\ell) \tag{5}$$

where $q_i(\ell) \sim \mathcal{CN}(0, \sigma_i^2(\ell))$. Note that the test channel (5) used for the generation of the quantization codebooks is not optimal
since the distribution of $y_i(\ell) = \sum_{j=1}^{K} h_{ij}(\ell) x_j(\ell) + n_i(\ell)$ is not Gaussian. However, one can argue that in a large system with
many UTs, the random variable $y_i(\ell)$ is almost Gaussian distributed and the performance degradation due to the sub-optimal
choice of $f_{y_i'(\ell) \mid y_i(\ell)}$ is small. A simple upper bound of the rate-distortion function is given by

$$I\left(y_i'(\ell); y_i(\ell)\right) = h\left(y_i'(\ell)\right) - h\left(y_i'(\ell) \mid y_i(\ell)\right) \le \log_2\left(\pi e\, E\left[|y_i'(\ell)|^2\right]\right) - \log_2\left(\pi e\, \sigma_i^2(\ell)\right) \tag{6}$$

where the inequality is obtained by upper-bounding the entropy of $y_i'(\ell)$ by the entropy of a complex Gaussian random variable
with the same variance. We assume further that each BS uses $C/(ML)$ bits for the compression of each received complex
symbol per antenna per sub-carrier. Replacing the left-hand side (LHS) of (6) by $C/(ML)$, we can consequently overestimate
the quantization noise variance $\sigma_i^2(\ell)$ by choosing

$$\sigma_i^2(\ell) = \sigma_i^2 = \frac{1 + \frac{P}{L}\sum_{j=1}^{K} v_{ij}}{2^{C/(ML)} - 1}. \tag{7}$$

Since the statistical distribution of $y_i(\ell)$ is the same for all sub-carriers, the quantization noise power $\sigma_i^2$ is also independent
of $\ell$. One can easily verify that the quantization noise vanishes for infinite backhaul capacity, i.e., $\sigma_i^2 \to 0$ for $C \to \infty$, and
grows without bound when the backhaul has zero capacity, i.e., $\sigma_i^2 \to \infty$ for $C \to 0$.
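As a rough numerical illustration of how the quantization noise depends on the backhaul capacity, the sketch below evaluates the noise variance per BS antenna. It assumes the reading $\sigma_i^2 = \bigl(1 + \tfrac{P}{L}\sum_j v_{ij}\bigr) / \bigl(2^{C/(ML)} - 1\bigr)$, i.e. the Gaussian upper bound met with equality at $C/(ML)$ bits; the function name and the test values are hypothetical.

```python
import numpy as np

def quantization_noise_variance(V, P, L, C, M):
    """Per-antenna quantization noise variance for backhaul capacity C > 0.

    Assumed formula: (1 + (P/L) * sum_j v_ij) / (2**(C/(M*L)) - 1).
    """
    V = np.asarray(V, dtype=float)
    received_power = 1.0 + (P / L) * V.sum(axis=1)   # E|y_i|^2 with unit noise power
    # expm1(x * ln 2) equals 2**x - 1 and stays accurate for small C.
    return received_power / np.expm1(C / (M * L) * np.log(2.0))

# Hypothetical values: 2 receive antennas, 3 UTs, unit variance profile.
sigma2_example = quantization_noise_variance(np.ones((2, 3)), P=1.0, L=1, C=2.0, M=1)
```

With these values the received power is 4 and $2^{2} - 1 = 3$, so each entry equals $4/3$. Increasing `C` shrinks the result towards zero, while `C` close to zero makes it grow without bound.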

We would like to point out that the field of distributed compression with imperfect CSI is to the best of our knowledge 
a largely unexplored area. It is for example not clear if each BS should estimate its local channels and forward compressed 
versions of its estimates to the CS or if the CS should estimate all channels based on compressed signals from the BSs, as 
assumed in this work. 



C. Channel Training 

$^{4}$By orthogonal backhaul links we mean here that there is no inter-backhaul interference. This is for example the case for a wired backhaul network with
a dedicated link between the CS and each BS.

Similar to [23], each channel coherence block of length $T$ is split into a phase for channel training and a phase for data
transmission. During the training phase of length $\tau$, all $K$ UTs broadcast orthogonal sequences of known pilot symbols of
equal power $P/L$ on all sub-carriers. The orthogonality of the training sequences imposes $\tau \ge K$. We assume that the CS
estimates the channels $h_{ij}(\ell)$ from all UTs to all BSs based on the observations



$$r_{ij}(\ell) = \sqrt{\frac{\tau P}{L}}\, h_{ij}(\ell) + s_{ij}(\ell) \tag{8}$$



where $s_{ij}(\ell) \sim \mathcal{CN}(0, 1 + \sigma_i^2)$ captures the effects of the thermal noise at the BS-antennas and the quantization error on the
backhaul links. For details on how the scalar estimation channel (8) is obtained, we refer the reader to [23]. It becomes clear
from the last equation that the quantization noise degrades the channel estimate. Thus, the backhaul capacity $C$ has a significant
influence on the optimal training length $\tau^*$. This point will be further discussed in Section IV. Computing the minimum mean
square error (MMSE) estimate of $h_{ij}(\ell)$ given the observation $r_{ij}(\ell)$, we can decompose $h_{ij}(\ell)$ into the estimate $\hat{h}_{ij}(\ell)$ and
the independent estimation error $\tilde{h}_{ij}(\ell)$, such that



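$$h_{ij}(\ell) = \hat{h}_{ij}(\ell) + \tilde{h}_{ij}(\ell). \tag{9}$$

The variance of the estimated channel $\hat{v}_{ij}(\tau)$ and the variance of the estimation error $\tilde{v}_{ij}(\tau)$ are respectively given as

$$\hat{v}_{ij}(\tau) = E\left[|\hat{h}_{ij}(\ell)|^2\right] = \frac{\tau \frac{P}{L} v_{ij}^2}{\tau \frac{P}{L} v_{ij} + 1 + \sigma_i^2} \tag{10}$$

$$\tilde{v}_{ij}(\tau) = E\left[|\tilde{h}_{ij}(\ell)|^2\right] = \frac{v_{ij}\left(1 + \sigma_i^2\right)}{\tau \frac{P}{L} v_{ij} + 1 + \sigma_i^2}. \tag{11}$$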



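Denote $\hat{\mathbf{V}}(\tau)$ and $\tilde{\mathbf{V}}(\tau)$ the variance profiles of the estimated channel $\hat{\mathbf{H}}(\ell)$ and the estimation error $\tilde{\mathbf{H}}(\ell)$, respectively. One
can easily verify that the total energy of the channel is conserved since

$$\mathbf{V} = \hat{\mathbf{V}}(\tau) + \tilde{\mathbf{V}}(\tau). \tag{12}$$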
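The split of the channel energy between estimate and estimation error can be checked numerically. The sketch below assumes the standard scalar MMSE variances for a pilot observation with gain $\tau P/L$ and noise power $1 + \sigma_i^2$, which is how the training model is read here; the function name and the numbers are illustrative only.

```python
import numpy as np

def mmse_variances(V, sigma2, tau, P, L):
    """Return (V_hat, V_tilde): variances of the MMSE estimate and of its error.

    Assumes the observation sqrt(tau*P/L) * h_ij + s_ij with s_ij ~ CN(0, 1 + sigma_i^2).
    """
    V = np.asarray(V, dtype=float)
    noise = (1.0 + np.asarray(sigma2, dtype=float))[:, None]  # one value per receive antenna
    gain = tau * P / L
    denom = gain * V + noise
    V_hat = gain * V**2 / denom
    V_tilde = V * noise / denom
    return V_hat, V_tilde

# Hypothetical 2 x 2 variance profile and quantization noise levels.
V_demo = np.array([[1.0, 0.1], [0.2, 2.0]])
V_hat_demo, V_tilde_demo = mmse_variances(V_demo, sigma2=[0.5, 0.3], tau=10, P=1.0, L=1)
```

Whatever the parameters, `V_hat_demo + V_tilde_demo` reproduces `V_demo`. Without training (`tau = 0`) the estimate carries no energy, and for long training the error variance tends to zero.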



D. Data Transmission 

In each channel coherence block, the UTs broadcast their data simultaneously during $T - \tau$ channel uses. The CS jointly
decodes the messages from all UTs, leveraging the previously computed channel estimate $\hat{\mathbf{H}}(\ell)$. With the knowledge of
$\hat{\mathbf{H}}(\ell)$, the CS "sees" in its received signal $\mathbf{y}'(\ell) = [y_1'(\ell), \dots, y_{BM}'(\ell)]^{\mathsf{T}}$ the useful term $\hat{\mathbf{H}}(\ell)\mathbf{x}(\ell)$ and the overall noise term
$\mathbf{z}(\ell) = \tilde{\mathbf{H}}(\ell)\mathbf{x}(\ell) + \mathbf{n}(\ell) + \mathbf{q}(\ell)$, i.e.,

$$\mathbf{y}'(\ell) = \hat{\mathbf{H}}(\ell)\mathbf{x}(\ell) + \mathbf{z}(\ell) \tag{13}$$

where the quantization noise vector $\mathbf{q}(\ell) = [q_1(\ell), \dots, q_{BM}(\ell)]^{\mathsf{T}}$ is defined by (5). Since the channels, signals and noise
are i.i.d. with respect to the sub-carrier index $\ell$, we will hereafter omit the dependence on $\ell$ and consider a
single isolated sub-carrier.



III. Net Ergodic Achievable Rate 
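The capacity of the channel (13) is not explicitly known. We consider therefore a lower bound of the normalized ergodic
mutual information $\frac{1}{BM} I(\mathbf{y}'; \mathbf{x} \mid \hat{\mathbf{H}})$, referred to hereafter as the ergodic achievable rate $R(\tau)$. This lower bound is in essence
obtained by overestimating the detrimental effect of the estimation error, treating the total noise term $\mathbf{z}$ as independent complex
Gaussian noise with covariance matrix $\mathbf{K}_z(\tau) \in \mathbb{R}_+^{BM \times BM}$, given as

$$\mathbf{K}_z(\tau) = E\left[\mathbf{z}\mathbf{z}^{\mathsf{H}}\right] = \mathrm{diag}\left(1 + \sigma_i^2 + \frac{P}{L}\sum_{j=1}^{K} \tilde{v}_{ij}(\tau)\right)_{i=1}^{BM}. \tag{14}$$

Thus, the ergodic achievable rate can be written as [34], [23]

$$R(\tau) = \frac{1}{BM}\, E\left[\log\det\left(\mathbf{I}_{BM} + \frac{P}{L}\bar{\mathbf{H}}(\tau)\bar{\mathbf{H}}(\tau)^{\mathsf{H}}\right)\right] \tag{15}$$

where we have defined the effective channel $\bar{\mathbf{H}}(\tau)$ as

$$\bar{\mathbf{H}}(\tau) = \mathbf{K}_z^{-1/2}(\tau)\hat{\mathbf{H}}. \tag{16}$$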
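For small systems, the ergodic achievable rate can be estimated by plain Monte Carlo averaging over the effective channel. The following sketch is an assumption-laden illustration rather than the simulation code behind the paper's figures: it takes the rate to be normalized by the number of receive antennas, measures it in nats, and uses hypothetical function names.

```python
import numpy as np

def effective_profile(V, sigma2, tau, P, L):
    """Variance profile of the effective channel: V_hat scaled row-wise by 1/diag(K_z)."""
    V = np.asarray(V, dtype=float)
    noise = (1.0 + np.asarray(sigma2, dtype=float))[:, None]
    denom = tau * P / L * V + noise
    V_hat = tau * P / L * V**2 / denom            # variance of the MMSE estimate
    V_tilde = V * noise / denom                   # variance of the estimation error
    k_z = noise[:, 0] + P / L * V_tilde.sum(axis=1)   # diagonal of K_z(tau)
    return V_hat / k_z[:, None]

def ergodic_rate_mc(V_bar, P, L, n_draws=2000, seed=0):
    """Monte Carlo estimate of (1/N) E[log det(I + (P/L) H H^H)] in nats."""
    rng = np.random.default_rng(seed)
    N, K = V_bar.shape
    S = np.sqrt(V_bar)
    total = 0.0
    for _ in range(n_draws):
        W = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
        H = S * W
        _, logdet = np.linalg.slogdet(np.eye(N) + (P / L) * H @ H.conj().T)
        total += logdet
    return total / (n_draws * N)

# Hypothetical toy setting: N = 2 receive antennas, K = 2 UTs.
V_toy = np.ones((2, 2))
rate_trained = ergodic_rate_mc(effective_profile(V_toy, [0.5, 0.5], 10, 1.0, 1), 1.0, 1, n_draws=200)
rate_untrained = ergodic_rate_mc(effective_profile(V_toy, [0.5, 0.5], 0, 1.0, 1), 1.0, 1, n_draws=200)
```

Without training the effective channel is identically zero, so `rate_untrained` is zero, while any positive training length gives a strictly positive estimate.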
Note that the ergodic achievable rate does not account for the fact that only a fraction $(1 - \tau/T)$ of the total coherence
block length can be used for data transmission. Our goal is thus to find the optimal training length $\tau^*$, maximizing the net
ergodic achievable rate

$$R_{\mathrm{net}}(\tau) = \left(1 - \frac{\tau}{T}\right) R(\tau). \tag{17}$$

Here, the difficulty consists in computing the ergodic achievable rate $R(\tau)$ explicitly. Since a closed-form expression of $R(\tau)$
for finite dimensions of the channel matrix $\mathbf{H}$ seems intractable, we resort to an approximation based on the theory of large
random matrices. We will demonstrate shortly that this approximation, although only asymptotically tight, yields very close
approximations for even small values of $B$, $M$, $K$ and $L$.



A. Deterministic Equivalent 

In this section, we present a deterministic equivalent approximation $\bar{R}(\tau)$ of $R(\tau)$ in the large system limit, i.e., for
$K, BM, L \to \infty$ at the same speed. Denote $N = BM$ the product of the number of BSs and the number of antennas
per BS. The notation $K \to \infty$ will refer in the sequel to the following two conditions on $K$, $N$ and $L$:

$$0 < \liminf_{K \to \infty} \frac{N}{K} \le \limsup_{K \to \infty} \frac{N}{K} < \infty, \qquad 0 < \liminf_{K \to \infty} \frac{L}{K} \le \limsup_{K \to \infty} \frac{L}{K} < \infty. \tag{18}$$



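Define $\bar{\mathbf{V}}(\tau) = \mathbf{K}_z^{-1}(\tau)\hat{\mathbf{V}}(\tau)$ the variance profile of the effective channel $\bar{\mathbf{H}}(\tau)$ with elements

$$\bar{v}_{ij}(\tau) = \frac{\hat{v}_{ij}(\tau)}{1 + \sigma_i^2 + \frac{P}{L}\sum_{k=1}^{K} \tilde{v}_{ik}(\tau)} \tag{19}$$

and consider the following $N \times N$ matrices

$$\mathbf{D}_j(\tau) = \mathrm{diag}\left(\bar{v}_{1j}(\tau), \dots, \bar{v}_{Nj}(\tau)\right), \quad j = 1, \dots, K. \tag{20}$$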

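Denote by $\mathbb{C}^+ = \{z \in \mathbb{C} : \mathrm{Im}(z) > 0\}$, and by $\mathcal{S}$ the class of functions $f$ analytic over $\mathbb{C} \setminus \mathbb{R}_+$, such that for $z \in \mathbb{C}^+$,
$f(z) \in \mathbb{C}^+$ and $z f(z) \in \mathbb{C}^+$, and $\lim_{y \to \infty} -\mathrm{i}y f(\mathrm{i}y) = 1$, where $\mathrm{i} = \sqrt{-1}$.$^{5}$ We are now in position to state the deterministic
approximation $\bar{R}(\tau)$ of $R(\tau)$ based on a direct application of [39, Theorem 2.3] (see also [38, Theorems 2.4 and 4.1]) to our
channel model.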

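Theorem 1 (Deterministic Equivalent): Let $\tau > 0$. Assume that $K$, $N$ and $L$ satisfy (18) and $0 < \bar{v}_{ij}(\tau) < v_{\max} < \infty$ $\forall i, j$.
Then:

(i) The following implicit equation:

$$\mathbf{T}(z) = \left(\frac{1}{K}\sum_{j=1}^{K} \frac{\mathbf{D}_j(\tau)}{1 + \frac{1}{K}\mathrm{tr}\,\mathbf{D}_j(\tau)\mathbf{T}(z)} - z\mathbf{I}_N\right)^{-1} \tag{21}$$

admits a unique solution $\mathbf{T}(z) = \mathrm{diag}\left(t_1(z), \dots, t_N(z)\right)$ such that $\left(t_1(z), \dots, t_N(z)\right) \in \mathcal{S}^N$.

(ii) Let $P > 0$. Denote $\mathbf{T}_P = \mathbf{T}\left(-\frac{L}{KP}\right)$ and consider the quantity:

$$\bar{R}(\tau) = \frac{1}{N}\sum_{j=1}^{K} \log\left(1 + \frac{1}{K}\mathrm{tr}\,\mathbf{D}_j(\tau)\mathbf{T}_P\right) - \frac{1}{N}\log\det\left(\frac{L}{KP}\mathbf{T}_P\right) - \frac{1}{N}\sum_{j=1}^{K} \frac{\frac{1}{K}\mathrm{tr}\,\mathbf{D}_j(\tau)\mathbf{T}_P}{1 + \frac{1}{K}\mathrm{tr}\,\mathbf{D}_j(\tau)\mathbf{T}_P}. \tag{22}$$

Then, the following holds true:

$$R(\tau) - \bar{R}(\tau) \xrightarrow[K \to \infty]{} 0. \tag{23}$$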
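A deterministic equivalent of this kind is typically evaluated by iterating the implicit equation until it stabilizes. The sketch below is one possible implementation under the reading $\mathbf{T}(z) = \bigl(\tfrac{1}{K}\sum_j \mathbf{D}_j / (1 + \tfrac{1}{K}\mathrm{tr}\,\mathbf{D}_j\mathbf{T}(z)) - z\mathbf{I}_N\bigr)^{-1}$ evaluated at $z = -L/(KP)$; the function name, the starting point, the stopping rule and the use of natural logarithms are choices made for this illustration.

```python
import numpy as np

def deterministic_equivalent(V_bar, P, L, tol=1e-12, max_iter=10_000):
    """Return (rate, t): deterministic rate approximation in nats and the diagonal of T_P.

    V_bar is the N x K variance profile of the effective channel.
    """
    V_bar = np.asarray(V_bar, dtype=float)
    N, K = V_bar.shape
    rho = L / (K * P)                      # T_P = T(z) at z = -rho
    t = np.full(N, 1.0 / rho)              # solution when V_bar is all zeros
    for _ in range(max_iter):
        e = V_bar.T @ t / K                # e_j = (1/K) tr(D_j T)
        t_new = 1.0 / (V_bar @ (1.0 / (1.0 + e)) / K + rho)
        converged = np.max(np.abs(t_new - t)) < tol
        t = t_new
        if converged:
            break
    e = V_bar.T @ t / K
    rate = (np.sum(np.log1p(e)) - np.sum(np.log(rho * t)) - np.sum(e / (1.0 + e))) / N
    return rate, t

# Sanity check on a homogeneous profile: N = K = 4, unit variances, K*P/L = 1.
rate_iid, t_iid = deterministic_equivalent(np.ones((4, 4)), P=0.25, L=1)
```

For this homogeneous square profile the fixed point reduces to the scalar equation $t^2 + t - 1 = 0$, so every diagonal entry equals $(\sqrt{5} - 1)/2 \approx 0.618$ and the rate is about $0.5805$ nats per receive antenna, which matches the classical large-system result for i.i.d. channels at unit SNR.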



$^{5}$Such functions are known to be Stieltjes transforms of probability measures over $\mathbb{R}_+$; see for instance [38, Proposition 2.2].
B. Optimization of the training length $\tau$

In this section, we consider the optimization of the training length $\tau$ with the goal of maximizing the net ergodic achievable
rate $R_{\mathrm{net}}(\tau)$. In order to find the optimal training length $\tau^*$ for a given coherence block length $T$, we wish to solve the following
optimization problem:

$$\begin{aligned} \text{maximize} \quad & R_{\mathrm{net}}(\tau) \\ \text{subject to} \quad & K \le \tau \le T. \end{aligned} \tag{24}$$

As this optimization problem is intractable for finite dimensions, we pursue the following approach:

1) We find $\bar{\tau}^*$ maximizing the deterministic approximation $\bar{R}_{\mathrm{net}}(\tau) = \left(1 - \frac{\tau}{T}\right)\bar{R}(\tau)$.

2) We show that $\bar{R}_{\mathrm{net}}(\bar{\tau}^*) - R_{\mathrm{net}}(\tau^*) \to 0$ and $\bar{\tau}^* - \tau^* \to 0$ as $K \to \infty$.

3) We verify by simulations that $\bar{\tau}^*$ is very close to $\tau^*$ for even small values of $K$, $N$ and $L$.
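We start by establishing the concavity of $\bar{R}_{\mathrm{net}}(\tau)$, our new objective function. Denote$^{6}$

$$\bar{v}'_{ij}(\tau) = \frac{\hat{v}'_{ij}(\tau) + \frac{P}{L}\,\bar{v}_{ij}(\tau)\sum_{k=1}^{K} \hat{v}'_{ik}(\tau)}{1 + \sigma_i^2 + \frac{P}{L}\sum_{k=1}^{K} \tilde{v}_{ik}(\tau)} \tag{25}$$

where

$$\hat{v}'_{ij}(\tau) = -\tilde{v}'_{ij}(\tau) = \frac{P}{L}\,\frac{\tilde{v}_{ij}^2(\tau)}{1 + \sigma_i^2} > 0 \tag{26}$$

and define the matrices

$$\mathbf{D}'_j(\tau) = \mathrm{diag}\left(\bar{v}'_{1j}(\tau), \dots, \bar{v}'_{Nj}(\tau)\right), \quad j = 1, \dots, K. \tag{27}$$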



A simple composition rule [40, Exercise 3.32 (b)] states that the product of a positive decreasing linear function and a positive
increasing concave function is also concave. In order to prove the concavity of $\bar{R}_{\mathrm{net}}(\tau) = \left(1 - \frac{\tau}{T}\right)\bar{R}(\tau)$, it is thus sufficient
to show that $\bar{R}(\tau)$ is an increasing concave function in $\tau$. A sufficient condition for concavity is $\bar{R}''(\tau) < 0$. We begin
by considering the first derivative $\bar{R}'(\tau)$, which allows for a simple and concise closed-form expression as provided by the next
theorem:

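Theorem 2 (Derivative): Under the same conditions as for Theorem 1, the first derivative of $\bar{R}(\tau)$ permits the explicit
expression

$$\bar{R}'(\tau) = \frac{1}{N}\sum_{j=1}^{K} \frac{\frac{1}{K}\mathrm{tr}\,\mathbf{D}'_j(\tau)\mathbf{T}_P}{1 + \frac{1}{K}\mathrm{tr}\,\mathbf{D}_j(\tau)\mathbf{T}_P} \tag{28}$$

where $\mathbf{T}_P$ is given by Theorem 1 (i). Moreover, for any $P, \tau > 0$, $\bar{R}(\tau)$ is an increasing function, i.e.,

$$\bar{R}'(\tau) > 0. \tag{29}$$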



Proof: See Appendix A. ■
Despite the simplicity of the expression of $\bar{R}'(\tau)$ in Theorem 2, it seems intractable to show that $\bar{R}''(\tau) < 0$ for channel
matrices with a general variance profile. This is due to the fact that not only $\mathbf{D}_j(\tau)$ but also $\mathbf{T}_P$ depends on $\tau$. The matrix
$\mathbf{T}_P$ is in general given as the solution of an implicit equation which can only be determined numerically, e.g. by a fixed-point
algorithm. It is thus difficult to infer the behavior of $\mathbf{T}_P$ with respect to $\tau$. However, one can show for the particular case of
a doubly regular variance profile that $\bar{R}(\tau)$ is indeed concave.

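Theorem 3 (Concavity): Let $P, \tau > 0$. Assume that $N = K$ and that $\bar{\mathbf{V}}(\tau)$ is a doubly regular matrix which satisfies the
following regularity condition:

$$\mathcal{K}(\tau) = \frac{1}{N}\sum_{i=1}^{N} \bar{v}_{ik}(\tau) = \frac{1}{N}\sum_{j=1}^{N} \bar{v}_{lj}(\tau) \quad \forall k, l. \tag{30}$$

Then, $\bar{R}(\tau)$ is a strictly concave function.

Proof: See Appendix B. ■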
Remark 3.1: Based on our simulation results, we conjecture that Theorem 3 also holds for non doubly regular variance
profiles $\bar{\mathbf{V}}(\tau)$. Intuitively, $\bar{R}(\tau)$ being a concave function means nothing else than that channel training shows diminishing
returns. That is, the marginal benefit of each training symbol decreases until the channel estimation becomes nearly perfect.
The previous argument can be made clear considering the two extreme cases $\tau = 0$ and $\tau \to \infty$. One can easily verify that
$\mathbf{D}_j(0) = \mathbf{0}$ while $\mathbf{D}'_j(0) > 0$. This implies $\bar{R}'(0) > 0$, i.e., channel training increases the ergodic achievable rate. On the
other hand, as $\tau \to \infty$, $\mathbf{D}'_j(\tau) \to \mathbf{0}$, so that also $\bar{R}'(\tau) \to 0$, i.e., the marginal benefit of channel training vanishes. It is thus
reasonable to conjecture that $\bar{R}'(\tau)$ is a decreasing function of $\tau$ and hence $\bar{R}(\tau)$ a concave function.

$^{6}$We use $f'(x)$ to denote the first derivative of the function $f(x)$, i.e., $f'(x) = \frac{df(x)}{dx}$.

As a consequence of Theorem 3 and Remark 3.1, we assume that $\bar{R}_{\mathrm{net}}(\tau)$ takes its global maximum in $(0, T]$ and the optimal
training length $\bar{\tau}^*$ can be determined as the solution of

$$\bar{R}'_{\mathrm{net}}(\tau) = \left(1 - \frac{\tau}{T}\right)\bar{R}'(\tau) - \frac{1}{T}\bar{R}(\tau) = 0. \tag{31}$$

The value $\bar{\tau}^*$ can now be easily found, e.g. via the bisection method. It remains to show that the optimal training length
$\bar{\tau}^*$ which maximizes $\bar{R}_{\mathrm{net}}(\tau)$ is asymptotically optimal for the original objective function $R_{\mathrm{net}}(\tau)$. This is done in the next
theorem.
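The bisection step mentioned above can be sketched generically. The code below looks for the zero of $(1 - \tau/T)\bar{R}'(\tau) - \bar{R}(\tau)/T$, assuming the net rate is concave so that this expression changes sign at most once. The stand-in rate $\log(1 + \tau)$ used in the example is purely a toy function chosen here for illustration, not the deterministic equivalent of the paper, and the function name is hypothetical.

```python
import math

def optimal_training_length(R, dR, T, lo, tol=1e-9):
    """Bisection on g(tau) = (1 - tau/T) * dR(tau) - R(tau)/T over [lo, T].

    R and dR are callables returning the rate and its derivative.
    Assumes g is decreasing, i.e. the net rate (1 - tau/T) * R(tau) is concave.
    """
    def g(tau):
        return (1.0 - tau / T) * dR(tau) - R(tau) / T

    hi = float(T)
    if g(lo) <= 0.0:          # net rate already decreasing at the lower limit
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:      # still increasing: the maximizer lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy stand-in for an increasing concave rate function (an assumption of this example).
toy_rate = lambda tau: math.log(1.0 + tau)
toy_rate_derivative = lambda tau: 1.0 / (1.0 + tau)
tau_opt = optimal_training_length(toy_rate, toy_rate_derivative, T=100, lo=1)
```

In the paper's setting, `R` and `dR` would be replaced by evaluations of the deterministic equivalent and of its derivative, and `lo` by the smallest admissible training length.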

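Theorem 4 (Convergence): Let $\tau^* = \arg\max_{\tau \in [0,T]} R_{\mathrm{net}}(\tau)$ and $\bar{\tau}^* = \arg\max_{\tau \in [0,T]} \bar{R}_{\mathrm{net}}(\tau)$. Then, under the same conditions
as for Theorem 1, the following holds true:

(i)

$$\bar{R}_{\mathrm{net}}(\bar{\tau}^*) - R_{\mathrm{net}}(\tau^*) \xrightarrow[K \to \infty]{} 0. \tag{32}$$

(ii) Further assume that $\bar{\mathbf{V}}(\tau)$ is a doubly regular matrix which satisfies the conditions of Theorem 3. Then,

$$\bar{\tau}^* - \tau^* \xrightarrow[K \to \infty]{} 0 \tag{33}$$

where $\bar{\tau}^*$ is given as the solution to

$$\bar{R}'_{\mathrm{net}}(\tau) = \left(1 - \frac{\tau}{T}\right)\bar{R}'(\tau) - \frac{1}{T}\bar{R}(\tau) = 0 \tag{34}$$

with $\bar{R}(\tau)$ and $\bar{R}'(\tau)$ given by Theorem 1 (ii) and Theorem 2, respectively.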

Proof: See Appendix C. ■
Theorem 4 (i) merely states that the maximum point of $R_{\mathrm{net}}(\tau)$ can be arbitrarily closely approximated by the maximum
point of $\bar{R}_{\mathrm{net}}(\tau)$. This result is independent of the structure of the variance profile $\bar{\mathbf{V}}(\tau)$. Theorem 4 (ii) provides a simple way
to compute $\bar{\tau}^*$ and states that this value is also asymptotically optimal for $R_{\mathrm{net}}(\tau)$. However, this result requires $\bar{\mathbf{V}}(\tau)$ to be
a doubly regular matrix. Both results together imply that optimizing $\bar{R}_{\mathrm{net}}(\tau)$ is asymptotically identical to optimizing $R_{\mathrm{net}}(\tau)$.
The simulations in the next section suggest that Theorem 3 and Theorem 4 also hold for non doubly regular variance profiles.

IV. Numerical Results 

In order to show the validity of our analysis in the preceding sections, we consider a simple cellular system consisting
of $B = 3$ BSs with $M = 2$ antennas and $K = 3$ UTs, as shown in Fig. 2. The locations of the UTs are randomly chosen
according to a uniform distribution. The inverse path loss factor a^k between UT k and BS b is given as aj^k = d^k '^^ where 
is the distance between UT k and BS b, normalized to the maximum distance within a cell. We consider one random 
snapshot of user distributions, resulting in the inverse path loss matrix 

$$\mathbf{A} = \begin{pmatrix} 2.9775 & 0.0385 & 1.6055 \\ 0.2512 & 2.7826 & 0.1759 \\ 0.0615 & 0.0492 & 1.6376 \end{pmatrix}. \tag{35}$$

In the sequel, we assume $\mathbf{A}$ fixed while we average over many independent realizations of the channel matrix $\mathbf{H}$. The cell edge
signal-to-noise ratio is defined as $\mathrm{SNR} = E\left[|x_k(\ell)|^2\right] / E\left[|n_i(\ell)|^2\right] = P/L$. Unless otherwise stated, we assume $T = 1000$
and $L = 1$.

Fig. 3 depicts the net ergodic achievable rate $R_{\mathrm{net}}(\tau)$ and its deterministic equivalent approximation $\bar{R}_{\mathrm{net}}(\tau)$ by
Theorem 1 (ii) as a function of the SNR for a fixed training length of $\tau = 40$ and different values of the backhaul capacity
$C \in \{1, 5, 10\}$ bits/channel use. Clearly, $\bar{R}_{\mathrm{net}}(\tau)$ gives a very tight approximation of $R_{\mathrm{net}}(\tau)$ over the full range of SNR. The
effect of limited backhaul is particularly visible at high SNR where all curves saturate.

For the same set of parameters and SNR = 0 dB, we show in Fig. 4 $R_{\mathrm{net}}(\tau)$ and $\bar{R}_{\mathrm{net}}(\tau)$ as a function of the training length
$\tau$. This plot is consistent with Theorem 3 and the corresponding remark, as $\bar{R}_{\mathrm{net}}(\tau)$ is visibly a concave function. Moreover, since
the curves of $R_{\mathrm{net}}(\tau)$ and $\bar{R}_{\mathrm{net}}(\tau)$ match very closely, it is reasonable to assume that both take a similar maximum value at a
similar value of $\tau$. The validity of Theorem 4 is demonstrated in Fig. 5, which shows the optimal training length $\tau^*$, found by
an exhaustive search based on Monte Carlo simulations, and the training length $\bar{\tau}^*$ which maximizes $\bar{R}_{\mathrm{net}}(\tau)$ as a function of
the SNR for $C = 1$ bits/channel use and $T = 100$. The differences between both values, although very small, are mainly due
to the exhaustive search over a necessarily discrete set of values of $\tau$.
Fig. 2. Cellular example with B = 3 BSs and K = 3 UTs.
Fig. 3. Net ergodic achievable rate $R_{\mathrm{net}}(\tau)$ vs SNR for $\tau = 40$ and $T = 1000$. The markers are obtained by simulations, the solid lines correspond to the
deterministic equivalent $\bar{R}_{\mathrm{net}}(\tau)$.

Fig. 6 shows the dependence of the optimal training length $\tau^*$ on the backhaul capacity $C$ for a fixed SNR = 10 dB. One can
see that $\tau^*$ is a decreasing function of $C$ which converges quickly to a particular value corresponding to infinite capacity backhaul
links. The reason for this is the following. The CS estimates the channel coefficients based on the quantized training signals
received by the BSs. The channel estimate is hence impaired by thermal noise and quantization errors. Therefore, increasing $C$
results in better channel estimates and reduces the necessary training length. For infinite backhaul capacity, the optimal training
length is only dependent on the SNR. In a similar vein, Fig. 7 depicts $R_{\mathrm{net}}(\tau^*)$ as a function of the backhaul capacity $C$. We
notice the inefficient utilization of the backhaul links due to sub-optimal compression since the net ergodic achievable rate per
BS, i.e., $M \times R_{\mathrm{net}}(\tau^*)$, is much lower than the necessary backhaul capacity. For example, it takes $C = 20$ bits/channel use of
backhaul capacity to achieve a rate per BS of $2 \times R_{\mathrm{net}}(\tau^*) \approx 5.2$ bits/channel use.

V. Conclusion 

In this work, we have considered a frequency-selective fading network MIMO uplink channel with arbitrary path losses 
between the UTs and BSs and finite capacity backhaul links. Using a close approximation of the net ergodic achievable rate 
based on random matrix theory, we have studied the optimal tradeoff between the resources used for channel training and data 
transmission. Although the asymptotic results are proved to be tight only in the large system limit, our numerical examples show 
that they provide close approximations even for small system dimensions. Our results also show that limited backhaul capacity 
has a significant impact on the optimal training length. We conclude the paper by pointing out some shortcomings of
our system model which remain for future investigation.

1) Backhaul links and cooperation: A relevant question is how a BS should decide whether to cooperate by forwarding its
received data to some central processor or to process its received signals alone. In our model, the net throughput vanishes with
decreasing backhaul capacity although each BS could theoretically decode a part of the received messages alone. Future
work, also motivated by the recent results in [30], [31], comprises the investigation of flexible schemes which adapt the degree
of cooperation according to some statistical side-information about the channels, backhaul limitations, quality of CSI, etc.

Fig. 4. Net ergodic achievable rate $R_{\mathrm{net}}(\tau)$ vs training length $\tau$ for SNR = 0 dB and T = 1000. The markers are obtained by simulations, the solid lines
correspond to the deterministic equivalent $\bar{R}_{\mathrm{net}}(\tau)$.

Fig. 5. Optimal training lengths $\tau^*$ and $\bar{\tau}^*$ vs SNR for C = 1 bit/channel use and T = 100. The solid line corresponds to $\bar{\tau}^*$ maximizing $\bar{R}_{\mathrm{net}}(\tau)$, the
dashed line corresponds to $\tau^*$ maximizing $R_{\mathrm{net}}(\tau)$ and is obtained by an exhaustive search based on Monte Carlo simulations.

Fig. 6. Optimal training length $\tau^*$ vs backhaul capacity C for SNR = 10 dB and T = 1000.

Fig. 7. Net ergodic achievable rate $R_{\mathrm{net}}(\tau^*)$ with optimal channel training $\tau^*$ vs backhaul capacity C for SNR = 10 dB and T = 1000.

2) Inter-cluster interference: We have considered a multi-cell network composed of B cooperative cells without inter-cell
interference. In a real system, the effects of non-orthogonal training sequences leading to "pilot contamination" [31],
[21] also constitute an important issue for practical system design. Both aspects need to be taken into account for a more realistic
performance evaluation of network MIMO systems.



Appendix A 
Proof of Theorem 2

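We start by defining the following auxiliary variables $\delta_j = \frac{1}{K}\operatorname{tr}\mathbf{D}_j(\tau)\mathbf{T}_P$, $j = 1, \dots, K$. Using this definition, we can
re-write $\bar{R}(\tau)$ in (22) as

$$\bar{R}(\tau) = \frac{1}{N}\sum_{j=1}^{K}\left[\log(1+\delta_j) - \frac{\delta_j}{1+\delta_j}\right] - \frac{1}{N}\log\det\left(\frac{1}{KP}\mathbf{T}_P\right). \tag{36}$$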



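We define $\delta_j' = \frac{d\delta_j}{d\tau} = \frac{1}{K}\operatorname{tr}\mathbf{D}_j'(\tau)\mathbf{T}_P + \frac{1}{K}\operatorname{tr}\mathbf{D}_j(\tau)\mathbf{T}_P'$, where $\mathbf{T}_P' = \frac{d}{d\tau}\mathbf{T}_P$. Taking the derivative of $\bar{R}(\tau)$ with respect to $\tau$
yields

$$\bar{R}'(\tau) = \frac{1}{N}\sum_{j=1}^{K}\frac{\delta_j\,\delta_j'}{(1+\delta_j)^2} - \frac{1}{N}\operatorname{tr}\mathbf{T}_P^{-1}\mathbf{T}_P'. \tag{37}$$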



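This expression can be further simplified by re-writing the definition of $\mathbf{T}_P$ as a function of $\delta_j$:

$$\mathbf{T}_P = \left(\frac{1}{KP}\mathbf{I}_N + \frac{1}{K}\sum_{j=1}^{K}\frac{\mathbf{D}_j(\tau)}{1+\delta_j}\right)^{-1}. \tag{38}$$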



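Using this expression, we have

$$\operatorname{tr}\mathbf{T}_P^{-1}\mathbf{T}_P' = -\operatorname{tr}\mathbf{T}_P\,\frac{1}{K}\sum_{j=1}^{K}\frac{(1+\delta_j)\,\mathbf{D}_j'(\tau) - \delta_j'\,\mathbf{D}_j(\tau)}{(1+\delta_j)^2} = \sum_{j=1}^{K}\frac{\delta_j\,\delta_j' - (1+\delta_j)\frac{1}{K}\operatorname{tr}\mathbf{D}_j'(\tau)\mathbf{T}_P}{(1+\delta_j)^2}. \tag{39}$$

Plugging this expression into (37) and replacing $\delta_j$ by $\frac{1}{K}\operatorname{tr}\mathbf{D}_j(\tau)\mathbf{T}_P$ leads to

$$\bar{R}'(\tau) = \frac{1}{N}\sum_{j=1}^{K}\frac{\frac{1}{K}\operatorname{tr}\mathbf{D}_j'(\tau)\mathbf{T}_P}{1+\frac{1}{K}\operatorname{tr}\mathbf{D}_j(\tau)\mathbf{T}_P}. \tag{40}$$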
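In [39, Proposition 5.3], it is proved that

$$\left(\frac{1}{KP} + \max_{j} v_{ij}(\tau)\right)^{-1} \le \left[\mathbf{T}_P\right]_{ii} \le KP. \tag{41}$$

Since both $v_{ij}(\tau)$ and $v_{ij}'(\tau)$ are positive for $\tau, P > 0$, it follows from (41) that $\frac{1}{K}\operatorname{tr}\mathbf{D}_j'(\tau)\mathbf{T}_P > 0$ and $\frac{1}{K}\operatorname{tr}\mathbf{D}_j(\tau)\mathbf{T}_P > 0$.
This implies $\bar{R}'(\tau) > 0$, which concludes the proof.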
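
The derivative formula obtained in this proof can be checked numerically. The following is a minimal sketch under stated assumptions: the deterministic equivalent is taken to be the standard fixed-point form for a Gaussian matrix with variance profile $v_{ij}/K$, the scalar `x` plays the role of $1/(KP)$, and the variance profile `profile(tau)` is a hypothetical increasing function chosen for illustration. None of the function names below come from the paper.

```python
import numpy as np

def solve_delta(V, x, tol=1e-13, max_iter=10_000):
    """Iterate delta_j = (1/K) tr(D_j T) with D_j = diag(V[:, j]); T is diagonal here."""
    K = V.shape[1]
    delta = np.zeros(K)
    for _ in range(max_iter):
        # Diagonal of T = (x I + (1/K) sum_j D_j / (1 + delta_j))^{-1}
        t = 1.0 / (x + (V / (1.0 + delta)).sum(axis=1) / K)
        new_delta = V.T @ t / K
        if np.max(np.abs(new_delta - delta)) < tol:
            return new_delta, t
        delta = new_delta
    raise RuntimeError("fixed-point iteration did not converge")

def rate_equiv(V, x):
    """(1/N) sum_j [log(1+delta_j) - delta_j/(1+delta_j)] - (1/N) log det(x T), in nats."""
    N = V.shape[0]
    delta, t = solve_delta(V, x)
    return (np.sum(np.log1p(delta) - delta / (1.0 + delta)) - np.sum(np.log(x * t))) / N

def rate_equiv_slope(V, dV, x):
    """Closed-form derivative: (1/N) sum_j [(1/K) tr(D_j' T)] / (1 + delta_j)."""
    N, K = V.shape
    delta, t = solve_delta(V, x)
    return np.sum((dV.T @ t / K) / (1.0 + delta)) / N

# Hypothetical variance profile that increases with the training length tau.
rng = np.random.default_rng(0)
A = rng.uniform(0.2, 1.0, size=(8, 8))
profile = lambda tau: A * tau / (1.0 + tau)
profile_slope = lambda tau: A / (1.0 + tau) ** 2

x, tau, h = 0.1, 5.0, 1e-5
numeric = (rate_equiv(profile(tau + h), x) - rate_equiv(profile(tau - h), x)) / (2 * h)
analytic = rate_equiv_slope(profile(tau), profile_slope(tau), x)
print(numeric, analytic)
```

In this sketch the central finite difference and the closed-form slope agree, and the slope is positive, which is consistent with the monotonicity claimed by the theorem for an increasing variance profile.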

Appendix B 
Proof of Theorem 3

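We want to show that $\bar{R}''(\tau) < 0$. Under the assumption of a doubly regular variance profile matrix $\mathbf{V}(\tau)$, the implicit
matrix equation (21) for $\mathbf{T}(z)$ in Theorem 1 (i) reduces to a scalar equation, such that $\mathbf{T}(z) = t(z)\mathbf{I}_N$, where

$$t(z) = \frac{1}{-z + \frac{\mathcal{K}(\tau)}{1+\mathcal{K}(\tau)\,t(z)}}. \tag{42}$$

The unique solution to this equation (such that $t(z) \in \mathcal{S}$) can be given in closed form as

$$t(z) = \frac{\sqrt{1 - \frac{4\mathcal{K}(\tau)}{z}} - 1}{2\mathcal{K}(\tau)}. \tag{43}$$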



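Let $t_P = t\left(-\frac{1}{KP}\right)$. By Theorem 2, the first derivative of $\bar{R}(\tau)$ can be written as

$$\bar{R}'(\tau) = \frac{1}{N}\sum_{j=1}^{K}\frac{\frac{1}{K}\operatorname{tr}\mathbf{D}_j'(\tau)\mathbf{T}_P}{1+\frac{1}{K}\operatorname{tr}\mathbf{D}_j(\tau)\mathbf{T}_P} = \frac{t_P\,\mathcal{K}'(\tau)}{1+t_P\,\mathcal{K}(\tau)} \tag{44}$$

where $\mathcal{K}'(\tau) = \frac{d}{d\tau}\mathcal{K}(\tau)$. The second derivative is given as

$$\bar{R}''(\tau) = \frac{t_P'\,\mathcal{K}'(\tau) + t_P\,\mathcal{K}''(\tau)\left[1+t_P\,\mathcal{K}(\tau)\right] - \left[t_P\,\mathcal{K}'(\tau)\right]^2}{\left[1+t_P\,\mathcal{K}(\tau)\right]^2}. \tag{45}$$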



We now need to verify that the numerator of the last equation is negative. One can easily verify from (25) and (26) that
$\mathcal{K}'(\tau) > 0$, and it follows from (41) that $t_P > 0$. It remains to check that $t_P' < 0$ and $\mathcal{K}''(\tau) < 0$. To this end, write $t_P$ as

$$t_P = \frac{\sqrt{1+4KP\,\mathcal{K}(\tau)} - 1}{2\mathcal{K}(\tau)} = \frac{2KP}{\sqrt{1+4KP\,\mathcal{K}(\tau)} + 1} \tag{46}$$

which is a strictly decreasing function of $\tau$ since $\mathcal{K}'(\tau) > 0$. Hence, we have that $t_P' < 0$. In order to show that $\mathcal{K}''(\tau) < 0$,
define the two auxiliary functions $\hat{\mathcal{K}}(\tau) = \sum_{i=1}^{N}\hat{v}_{ij}(\tau)$ and $\tilde{\mathcal{K}}(\tau) = \sum_{i=1}^{N}\tilde{v}_{ij}(\tau)$, which are independent of the column
index j. It is a simple exercise to verify that $\hat{v}_{ij}(\tau)$ are positive increasing concave functions and $\tilde{v}_{ij}(\tau)$ are positive decreasing
convex functions. Due to the regularity conditions of the variance profile, one can verify from (7) that the quantization noise
$\sigma_i^2$ is the same for all BS antennas, i.e., $\sigma_i^2 = \sigma^2$. Thus,



N N ... 

1^ u,,{r) 



- (47) 



Since both K;(t) and (1 + ct^ + ^^/C(r)) ^ are positive increasing concave functions, it follows from BOl Exercise 3.32 (b)] 
that the same holds also for their product. Hence, $\mathcal{K}''(\tau) < 0$ and, thus, $\bar{R}''(\tau) < 0$.
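
The scalar step of this proof can be sanity-checked numerically. The sketch below assumes that the scalar fixed-point equation has the form $t = 1/(x + \kappa/(1+\kappa t))$ with $x = -z > 0$, whose positive root is $t = (\sqrt{1+4\kappa/x} - 1)/(2\kappa)$; the value of `x` and the range of `kappa` are arbitrary illustrative choices, and `t_closed_form` is a hypothetical helper name.

```python
import numpy as np

def t_closed_form(kappa, x):
    """Positive root of x*kappa*t**2 + x*t - 1 = 0, i.e. t = 1 / (x + kappa / (1 + kappa*t))."""
    return (np.sqrt(1.0 + 4.0 * kappa / x) - 1.0) / (2.0 * kappa)

x = 0.05                            # plays the role of -z > 0 (assumed value)
kappa = np.linspace(0.1, 5.0, 50)   # assumed range of the variance-profile constant
t = t_closed_form(kappa, x)

# Residual of the scalar fixed-point equation at the closed-form solution.
residual = t - 1.0 / (x + kappa / (1.0 + kappa * t))
print(np.max(np.abs(residual)))
```

Under these assumptions the closed form satisfies the fixed-point equation up to rounding error, stays positive, and decreases as `kappa` grows, which is the monotonicity used to conclude that the derivative of $t_P$ is negative when $\mathcal{K}(\tau)$ is increasing.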
Appendix C
Proof of Theorem 4

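We expand the difference $R_{\mathrm{net}}(\tau^*) - R_{\mathrm{net}}(\bar{\tau}^*)$ as follows:

$$\begin{aligned} R_{\mathrm{net}}(\tau^*) - R_{\mathrm{net}}(\bar{\tau}^*) ={}& \left[R_{\mathrm{net}}(\tau^*) - \bar{R}_{\mathrm{net}}(\tau^*)\right] \\ &+ \left[\bar{R}_{\mathrm{net}}(\tau^*) - \bar{R}_{\mathrm{net}}(\bar{\tau}^*)\right] \\ &+ \left[\bar{R}_{\mathrm{net}}(\bar{\tau}^*) - R_{\mathrm{net}}(\bar{\tau}^*)\right]. \end{aligned} \tag{48}$$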

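From Theorem 1 (ii), we have that the first and last terms on the right-hand side (RHS) of (48) vanish asymptotically, i.e.,

$$R_{\mathrm{net}}(\tau^*) - \bar{R}_{\mathrm{net}}(\tau^*) \xrightarrow[K\to\infty]{} 0 \tag{49}$$

$$\bar{R}_{\mathrm{net}}(\bar{\tau}^*) - R_{\mathrm{net}}(\bar{\tau}^*) \xrightarrow[K\to\infty]{} 0. \tag{50}$$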



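By the definition of $\tau^*$ and $\bar{\tau}^*$, we have for the left-hand side (LHS) of (48) and the second term on the RHS of (48)

$$R_{\mathrm{net}}(\tau^*) - R_{\mathrm{net}}(\bar{\tau}^*) \ge 0, \qquad \bar{R}_{\mathrm{net}}(\tau^*) - \bar{R}_{\mathrm{net}}(\bar{\tau}^*) \le 0. \tag{51}$$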

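Equations (48), (49), (50), and (51) together imply that

$$R_{\mathrm{net}}(\tau^*) - R_{\mathrm{net}}(\bar{\tau}^*) \xrightarrow[K\to\infty]{} 0 \tag{52}$$

$$\bar{R}_{\mathrm{net}}(\tau^*) - \bar{R}_{\mathrm{net}}(\bar{\tau}^*) \xrightarrow[K\to\infty]{} 0. \tag{53}$$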

Equation (52) together with Theorem 1 (ii) proves the first part of the theorem. Assume now that $\mathbf{V}(\tau)$ is a doubly regular
matrix. Since $\bar{R}_{\mathrm{net}}(\tau)$ is by Theorem 3 a strictly concave function which takes its unique maximum at the point $\bar{\tau}^*$, (53) implies
that $\tau^* - \bar{\tau}^* \to 0$ as $K \to \infty$.
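
The sandwich argument of this proof has a simple finite counterpart: if two curves differ by at most some gap everywhere, then evaluating the first curve at the maximizer of the second loses at most twice that gap. The sketch below illustrates this on toy curves; both curves, the grid, and the perturbation are hypothetical and are not the rate expressions of this paper.

```python
import numpy as np

tau = np.linspace(1.0, 99.0, 981)  # candidate training lengths (assumed grid)

# Toy "approximation" curve and a toy "true" curve that stays within eps of it.
approx = (1.0 - tau / 100.0) * np.log2(1.0 + tau / (1.0 + 0.1 * tau))
eps = 0.01
true = approx + eps * np.sin(3.0 * tau)

i_star = np.argmax(true)    # maximizer of the true curve
i_bar = np.argmax(approx)   # maximizer of the approximation
loss = true[i_star] - true[i_bar]      # loss from using the approximate maximizer
gap = np.max(np.abs(true - approx))    # uniform gap between the two curves
print(loss, 2.0 * gap)
```

The loss is nonnegative by the definition of the true maximizer and bounded by twice the gap by the same three-term expansion as in (48), so it vanishes whenever the gap does.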